AITopics | batch value function approximation

Batch Value Function Approximation via Support Vectors

Neural Information Processing Systems, Apr-6-2023, 16:41:57 GMT

We present three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments in a difficult, synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily adjust the complexity of the function approximator to fit the complexity of the value function.

batch value function approximation, formulation, support vector, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.67)

Batch Value Function Approximation via Support Vectors

Dietterich, Thomas G., Wang, Xin

Neural Information Processing Systems, Dec-31-2002

Virtually all existing work on value function approximation and policy-gradient methods starts with a parameterized formula for the value function or policy and then seeks to find the best policy that can be represented in that parameterized form. This can give rise to very difficult search problems for which the Bellman equation is of little or no use. In this paper, we take a different approach: rather than fixing the form of the function approximator and searching for a representable policy, we instead identify a good policy and then search for a function approximator that can represent it. Our approach exploits the ability of mathematical programming to represent a variety of constraints including those that derive from supervised learning, from advantage learning (Baird, 1993), and from the Bellman equation. By combining the kernel trick with mathematical programming, we obtain a function approximator that seeks to find the smallest number of support vectors sufficient to represent the desired policy.
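The idea of advantage-style constraints over a kernel expansion can be illustrated with a minimal sketch. It is not the authors' formulation: the five-state chain world, the Gaussian kernel width, the margin `eps`, the penalty `C`, and every function name below are assumptions made for illustration, and rewards are taken to be equal for all moves so that they drop out of the constraints. The code does not solve the linear program; it only evaluates the objective (L1 norm of the coefficients plus weighted slack) and the constraints for one hand-picked candidate solution.

```python
import math

def gaussian_kernel(x, y, sigma=2.0):
    """Gaussian (RBF) kernel on scalar states; sigma is an assumed width."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def value(state, centers, alphas):
    """Kernel expansion V(s) = sum_j alpha_j * K(s_j, s)."""
    return sum(a * gaussian_kernel(c, state) for c, a in zip(centers, alphas))

def advantage_slacks(pairs, centers, alphas, eps):
    """For each (good_next, bad_next) pair, return the slack needed so that
    V(good_next) >= V(bad_next) + eps - slack holds."""
    slacks = []
    for good_next, bad_next in pairs:
        margin = value(good_next, centers, alphas) - value(bad_next, centers, alphas)
        slacks.append(max(0.0, eps - margin))
    return slacks

def count_support_vectors(alphas, tol=1e-9):
    """A training state is a support vector if its coefficient is nonzero."""
    return sum(1 for a in alphas if abs(a) > tol)

def objective(alphas, slacks, C):
    """L1 norm of the coefficients plus C times the total slack."""
    return sum(abs(a) for a in alphas) + C * sum(slacks)

# Hypothetical chain world: states 0..4, goal at 4.
# The good move steps right, the bad move steps left (or stays put at 0).
centers = [0, 1, 2, 3, 4]
pairs = [(min(s + 1, 4), max(s - 1, 0)) for s in range(4)]

# Candidate solution: a single nonzero coefficient, placed on the goal state.
alphas = [0.0, 0.0, 0.0, 0.0, 1.0]

slacks = advantage_slacks(pairs, centers, alphas, eps=0.1)
n_sv = count_support_vectors(alphas)
cost = objective(alphas, slacks, C=10.0)
print(slacks, n_sv, cost)
```

In this toy setting a single support vector at the goal already ranks every good move above the corresponding bad move by the required margin, so all slacks are zero. A real solver would search over `alphas` and the slacks jointly rather than checking one candidate.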

formulation, function approximation, function approximator, (11 more...)

Country: North America > United States > Oregon > Benton County > Corvallis (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.63)
